Abstract. The D0 experiment faces many challenges in enabling access to large datasets for physicists on 
four continents. The new concepts for distributed large-scale computing implemented in D0 aim for an 
optimal use of the available computing resources while minimising the person-power needed for operation. 
The real-life test of these concepts is of special interest for the LHC Computing GRID, LCG, which follows 
a similar strategy. 
1 Introduction 

Some of the most interesting events in proton-antiproton 
collisions at the Tevatron are very difficult to distinguish 
from the overwhelming QCD background. Top and Higgs 
particles are moreover very rare. The Tevatron experiments 
D0 and CDF therefore record large amounts of data for 
later analysis and detailed studies. While taking data, each 
experiment records roughly 500 GB of raw data per day. 
Reconstructing these events adds another 1.1 TB/day in D0. 
Within the last two years around 350 TB have been 
accumulated. Providing this data volume for physics analysis 
performed by more than 100 people is one of the challenges 
D0 faces. 

With this amount of data, providing the necessary overall 
I/O rate is a difficult task. As only a fraction of the 
data can be stored on disk, tape mounting leads to major 
dead-times. D0 follows a combined concept of locally 
optimising the resource usage and distributing the data 
globally. For an international collaboration like D0, with 
around 50% of the collaborators coming from non-US 
institutes, the second step is of special importance: it 
provides easy data access not only for those resident near 
Fermilab but also for those working remotely, often on 
another continent. 

In addition, to analyse these data, sufficiently many 
simulated events need to be produced for selection studies 
and the estimation of detector effects. To fulfil these requests 
the production is distributed to many sites. Production 
chains with ever-changing versions and parameters are, 
however, complicated to handle. To ease the production, an 
automatic handling of large batches of jobs was developed. 

The concepts developed to meet these requirements are 
discussed in the following. 



2 Management of large batches of jobs 

To reduce the person-power needed for Monte Carlo 
production with its ever-changing versions and parameter 
settings, D0 developed the work-flow management system 
Runjob [1,2]. This work is being continued in collaboration 
with CMS [3]. 

It automates the linking of several processing tasks 
into a single job based on a job description, and allows the 
configuration of the individual programs to be separated from 
the definition of the work-flow (Fig. 1). Thus the individual 
steps of Monte Carlo production, like event generation, 
detector simulation and reconstruction, can be configured 
by the corresponding experts. Those performing the 
production can in turn concentrate on the work-flow, 
i.e. which program versions to combine. 

Fig. 1. Runjob separates the details of how to call individual 
programs (A, B and C) from the operational details of which 
programs (or program versions) to combine. 

Furthermore, the coherent description of the production 
work-flow allows the preservation of the meta-data 
describing what was actually run during the production. 
D0 stores these meta-data along with the produced 
simulated data in the SAM system described below. 
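
The separation of program configuration from work-flow definition, and the meta-data record that results from it, can be illustrated with a minimal sketch. This is not Runjob code: the names `StepConfig`, `CATALOGUE` and `run_workflow`, and the stand-in "programs" that merely append a label, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class StepConfig:
    """How to call one program; maintained by the expert for that program."""
    name: str
    version: str
    run: Callable[[str], str]  # takes an input label, returns an output label


def make_step(name: str, version: str) -> StepConfig:
    # Stand-in program: it only records that it processed its input.
    return StepConfig(name, version, lambda inp: f"{inp}>{name}-{version}")


# Expert-owned catalogue of configured programs (hypothetical entries).
CATALOGUE: Dict[Tuple[str, str], StepConfig] = {
    (n, v): make_step(n, v)
    for n, v in [
        ("generator", "v1"),
        ("simulation", "v1"),
        ("simulation", "v2"),
        ("reconstruction", "v1"),
    ]
}


def run_workflow(workflow: List[Tuple[str, str]], start: str = "seed"):
    """Chain the selected program versions and record what was run."""
    data = start
    metadata = []
    for key in workflow:
        step = CATALOGUE[key]
        data = step.run(data)
        metadata.append({"program": step.name, "version": step.version})
    return data, metadata


# Operator-owned work-flow: only which programs and versions to combine.
workflow = [("generator", "v1"), ("simulation", "v2"), ("reconstruction", "v1")]
output, metadata = run_workflow(workflow)
```

In this sketch the operator edits only the `workflow` list, while the returned `metadata` list is the record that would be stored alongside the produced data.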

Recently the work-flow description was extended to allow 
for job parallelisation. This way a single production 
can be run in many parallel jobs at a given site. Runjob 
takes care of the necessary initialisation and termination 
tasks, e.g. the combination of the results of those many 
parallel jobs. 
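
The pattern of initialisation, parallel execution and final combination can be sketched as follows. The functions `split_request`, `run_job` and `run_production` are hypothetical and use Python threads in place of batch jobs; the sketch only illustrates the split and merge pattern, not the actual Runjob implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_request(total_events, events_per_job):
    """Initialisation: return (first_event, n_events) for each parallel job."""
    return [
        (first, min(events_per_job, total_events - first))
        for first in range(0, total_events, events_per_job)
    ]


def run_job(chunk):
    """Stand-in for one production job: 'produce' a range of event numbers."""
    first, n_events = chunk
    return list(range(first, first + n_events))


def run_production(total_events, events_per_job, max_workers=4):
    chunks = split_request(total_events, events_per_job)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the order of the submitted chunks.
        partial_results = list(pool.map(run_job, chunks))
    # Termination: combine the results of the parallel jobs.
    merged = [event for part in partial_results for event in part]
    return chunks, merged


chunks, merged = run_production(total_events=10, events_per_job=4)
```

Here a request for 10 events with 4 events per job is split into three jobs, the last one being shorter, and the merged result contains every event exactly once.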



3 Optimised use of local resources 



Besides person-power, the optimised use of computing 
resources is important to fully exploit the physics potential 
of D0. Given the amount of simulated and real data, 
only a fraction of the available information can be kept 
on disk. All other data are stored in tape robots. 

To avoid inefficiencies due to tape mounting, the access 
to data needs to be optimised. D0 has developed a data 
management system (named Sequential Access through 
Meta-data, SAM [4]) which exploits the fact that the order 
of the stored physics events is of no interest for the analyses: 
instead of looping over a given list of files, the user requests 
a dataset. The order in which the files belonging to this 
dataset are processed is optimised to minimise the overall 
number of tape mounts. The definition of a dataset can be 
performed by the individual users based on the meta-data 
which describe the content of the files. 
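
The idea of defining a dataset by meta-data and then delivering its files in an order that reduces tape mounts can be sketched as below. The catalogue entries, the field names (`tape`, `stream`, `tier`) and the simple group-by-tape ordering are assumptions made for illustration; the actual optimisation in SAM is not described here and may differ.

```python
from itertools import groupby

# Hypothetical file catalogue: meta-data plus the tape each file is stored on.
CATALOGUE = [
    {"name": "f1", "tape": "T2", "stream": "top", "tier": "thumbnail"},
    {"name": "f2", "tape": "T1", "stream": "top", "tier": "thumbnail"},
    {"name": "f3", "tape": "T2", "stream": "top", "tier": "thumbnail"},
    {"name": "f4", "tape": "T1", "stream": "higgs", "tier": "thumbnail"},
    {"name": "f5", "tape": "T1", "stream": "top", "tier": "thumbnail"},
]


def define_dataset(catalogue, **criteria):
    """Select files whose meta-data match all given criteria."""
    return [f for f in catalogue if all(f.get(k) == v for k, v in criteria.items())]


def count_mounts(files):
    """A mount is needed whenever the tape changes between consecutive files."""
    return sum(1 for _ in groupby(files, key=lambda f: f["tape"]))


def delivery_order(files):
    """Group files by tape; the stable sort keeps the order within one tape."""
    return sorted(files, key=lambda f: f["tape"])


dataset = define_dataset(CATALOGUE, stream="top", tier="thumbnail")
mounts_in_list_order = count_mounts(dataset)
optimised = delivery_order(dataset)
mounts_optimised = count_mounts(optimised)
```

In this toy catalogue, reading the four selected files in list order alternates between the two tapes, whereas the grouped order needs one mount per tape.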

SAM also includes user-level bookkeeping and is a cen- 
tral component of the D0 software environment. It has 
recently been adopted by CDF. 



4 Worldwide distribution of data 



Even after optimising local resource utilisation and tape 
mounting, the access to the data is a bottleneck. To further 
improve the accessibility of data, especially for physics 
analysis, a scheme for data distribution was outlined in 
summer 2002 [5] (Fig. 2). Besides the central data repository, 
which holds all D0 data, several regional centres 
should hold copies of the data used for analysis and 
provide them to users of their region. 

These regional centres in addition serve the institutions 
in their region, such that collaborators at those 
institutes can develop and test their analyses on their local 
computer cluster or even on their desktops. 

With the tree structure D0 hopes to minimise the 
number of data copies made over long distances. Besides 
storage, such regional centres should provide a reasonable 
amount of CPU power to analyse the stored data. 
Fig. 2. Data distribution scheme within D0. The distribution 
of data is done in a tree structure from the central repository 
(CAC) to regional analysis centres (RACs), further to the 
institutional centres (IACs) and finally to the physicist's desktop 
(DAS). 
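
Why a tree reduces long-distance copies can be illustrated with a small sketch. The site names and the shape of the tree below are hypothetical, and the count simply assumes that every institute needs the same data once.

```python
# Hypothetical distribution tree: central repository -> regional -> institutional.
TREE = {
    "CAC": ["RAC-A", "RAC-B"],
    "RAC-A": ["IAC-1", "IAC-2", "IAC-3"],
    "RAC-B": ["IAC-4", "IAC-5"],
}


def long_distance_copies_direct(tree, root="CAC"):
    """Without regional centres every institute fetches from the centre."""
    return sum(len(tree.get(rac, [])) for rac in tree[root])


def long_distance_copies_tree(tree, root="CAC"):
    """With regional centres only one copy per region crosses the long link."""
    return len(tree[root])


def path_to(tree, target, node="CAC"):
    """Route a request for `target` down the tree; empty list if unreachable."""
    if node == target:
        return [node]
    for child in tree.get(node, []):
        sub_path = path_to(tree, target, child)
        if sub_path:
            return [node] + sub_path
    return []


direct = long_distance_copies_direct(TREE)
via_tree = long_distance_copies_tree(TREE)
route = path_to(TREE, "IAC-4")
```

With five institutes in two regions, the direct scheme needs five long-distance copies and the tree needs two; the remaining transfers stay within a region.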



4.1 Prototype 

To test the concept of regional centres which serve the 
associated institutes, a prototype was set up at the German 
Grid Computing Centre in Karlsruhe (GridKa) [6], with 
the five (now six) German institutes associated. 

This prototype was intended to test the main parameters 
of the system: the required network bandwidth 
and the manpower needed to run the regional centre. It was 
also meant to reveal shortcomings of the D0 software. 

GridKa was established in 2002 and is rapidly growing. 
The centre is currently shared by 8 HEP experiments. Its 
(at the time) roughly 180 compute nodes are set up with 
NFS-shared user home directories and NFS-shared 
experiment-specific disk areas. No experiment-specific code can 
be set up on the compute nodes. To allow for pre-Grid 
usage of the system, each experiment has a so-called 
software server, on which experiment-specific software can be 
installed and which serves for user login. These specifications 
are quite different from the setup used on the systems 
at FNAL, which are exclusively used by D0. 

Unfortunately, the required changes to the D0 software 
could not in all cases be implemented in a site-independent 
way, so that an adaptation of each version to 
each computing cluster is still necessary. To avoid these 
tasks in the future, a standard D0 computing environment 
needs to be defined which is flexible enough to deal 
with all possible cluster configurations. 

The network bandwidth to Fermilab is sufficient to 
continuously download the most condensed data format 
(Thumbnails). When large datasets are requested in one go, 
transfer rates of around 3.3 MB/s (about 0.3 TB/day) are 
observed. 

All changes required for doing D0 analysis at GridKa 
were available by the end of 2002, with the exception of 
luminosity access. During January several German collaborators 
used the additional resources to finish their analyses 
in time for Moriond. This successful end-user test 
demonstrated the value of regional centres. 
The experience with this prototype shows that, besides 
the technical achievements, it is of great advantage for the 
users to work at a centre in their own time zone, where 
user problems can be solved by the operators during the 
usual working hours. 



With these concepts D0 profits from additional, better 
exploited resources and reduced operation tasks. At the 
same time many of the distributed computing concepts 
foreseen for LCG [12] are tested in a real-life environment, 
which is of strong interest for the preparation of the LHC 
experiments. 



5 A GRID for D0 

While the regional analysis centre in Germany has proven 
its value, the effort required of the user is 
still large. The analysis code needs to be copied manually 
to GridKa and recompiled on the head node 
before it can be run. This procedure currently needs to 
be repeated for each site that is to be used. Moreover, the 
user needs to track the availability and performance of 
the different sites in order to make a reasonable choice 
about where to run his/her project. 

To allow for an automatic, dynamic adaptation to the 
actual situation, GRID tools are required. The JIM project 
[7,8,9,10] is an integration of existing Globus [11] tools 
with the SAM system [4] used by both D0 and CDF. 
JIM uses Condor [11] as its resource broker (Fig. 3). 
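
The kind of decision a resource broker takes over from the user can be sketched as follows. This is a toy ranking, not Condor's matchmaking mechanism or the JIM implementation: the `Site` fields, the eligibility rules and the preference for sites that already hold the data are assumptions made for illustration, and the site names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    free_slots: int          # currently idle batch slots
    cached_fraction: float   # share of the requested dataset already on disk
    has_release: bool        # required software version installed


def rank(site: Site, slots_needed: int) -> Optional[float]:
    """Return a rank for eligible sites, None for sites that cannot run the job."""
    if not site.has_release or site.free_slots < slots_needed:
        return None
    # Prefer the site that needs the least additional data transfer.
    return site.cached_fraction


def choose_site(sites: List[Site], slots_needed: int) -> Optional[str]:
    ranked = [(rank(s, slots_needed), s.name) for s in sites]
    eligible = [(r, name) for r, name in ranked if r is not None]
    if not eligible:
        return None
    return max(eligible)[1]


SITES = [
    Site("SiteA", free_slots=50, cached_fraction=0.2, has_release=True),
    Site("SiteB", free_slots=10, cached_fraction=0.9, has_release=True),
    Site("SiteC", free_slots=100, cached_fraction=0.95, has_release=False),
]

small_job_site = choose_site(SITES, slots_needed=5)
large_job_site = choose_site(SITES, slots_needed=20)
```

A small job goes to the site that already holds most of the data, while a larger job falls back to the site with enough free slots; a site lacking the software release is never chosen.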

An initial version of JIM has been installed at several 
sites within D0, including the GridKa cluster. First 
experiences with physics analyses in a GRID-like environment 
are expected in the near future. 



6 Summary 
To maximise the physics output D0 aims to optimise 
the use of its resources while reducing the person-power 
needed to maintain operation. 

The data management system SAM provides user-level 
bookkeeping and optimises the use of storage resources. 
Worldwide distribution of data through regional centres 
to individual institutions provides additional resources for 
standard tasks like physics analyses, Monte Carlo production 
or data (re)processing. JIM integrates Globus- and 
Condor-based GRID tools to provide coherent access to the 
globally distributed resources and to allow for global 
optimisation of their usage. Runjob eases the handling of large 
batches of jobs. 




Fig. 3. Components diagram of the SAM Grid project. 



